Heuristic Word Alignment with Parallel Phrases

Author

  • Maria Holmqvist

Abstract

This paper presents a word alignment method that uses parallel phrases from manually word-aligned sentence pairs to align words in new texts. Experiments on an English–Swedish parallel corpus showed that the heuristic phrase-based method produced word alignments with high precision. Furthermore, alignment recall improved when phrases were generalized with part-of-speech categories. We also compared the phrase-based method to statistical word alignment and found that a combination of phrase-based and statistical word alignments outperformed pure statistical alignment in terms of Alignment Error Rate (AER).
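The three measures named in the abstract (precision, recall and AER) can be illustrated with a minimal sketch. It assumes the commonly used definitions in which a gold standard contains sure links S and possible links P (with S a subset of P), and A is the set of predicted links; the abstract does not state which variant the paper uses, and the function name `alignment_scores` and the toy link sets are hypothetical.

```python
from typing import Set, Tuple

# A link pairs a source-word index with a target-word index.
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one set of predicted links.

    Assumed definitions (not taken from the paper itself):
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    # Sure links also count as possible links.
    possible = possible | sure
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")

    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)

    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy example (made-up links, for illustration only).
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
system_links = {(0, 0), (2, 1), (2, 2)}

p, r, aer = alignment_scores(system_links, gold_sure, gold_possible)
print(round(p, 3), round(r, 3), round(aer, 3))
```

In the toy example, two of the three predicted links are in the possible set and one of the two sure links is found, so precision is about 0.667, recall is 0.5 and AER is 0.4. A lower AER indicates a better alignment.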


Similar resources

High-precision Word Alignment with Parallel Phrases

We present a method to re-use manual word alignments for alignment of new corpora.

Automatic Phrase Alignment: Using statistical n-gram alignment for syntactic phrase alignment

A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated (i.e. parallel) documents. These parallel sentences are linked through alignment. Much work has been done on sentence and word alignment, but not as much on the intermediate level. This paper explores using n-gram alignment created for statistical machine translation based on GIZA++...

Improving Phrase Extraction via MBR Phrase Scoring and Pruning

One of the major reasons for translation errors in phrase-based SMT systems is the incorrect phrases induced from inaccurately word-aligned parallel data. In this paper, we propose a novel approach that uses the minimum Bayes-risk (MBR) principle to improve the accuracy of phrase extraction. Our approach operates as a four-stage pipeline: first, bilingual phrases are extracted from parallel corpu...

An Integrated Tool for Translation-Memory Maintenance

This paper presents an integrated tool for constructing and maintaining translation memory for memory-based machine translation. The tool aims to automate the construction and validation of translation memory at both the word and phrase levels from English-Thai parallel texts. To align English-Thai words and phrases, the crucial problems that must be resolved include multiple-word-expression boundary am...

Sequence segmentation for statistical machine translation

In the last decade, while statistical machine translation has advanced significantly, there is still much room for further improvements relating to many natural language processing tasks such as word segmentation, word alignment and parsing. Human language is composed of sequences of meaningful units. These sequences can be words, phrases, sentences or even articles serving as basic elements in...

Publication date: 2010